HCLBLAST for Genome Sequence Matching

نویسندگان

  • Monika Yadav
  • Sonal Chaudhary
  • S. G. Manikandan
چکیده

Genome sequence matching is used to reveal biological information hidden in the DNA sequences and genome sequences. The main objective is to find whether the given sequence is like other sequence or not. To find the similarity between the diseases and intensity of the disease DNA sequences are matched. There is large number of sequences and the database is still growing. Given a genome sequence and to find matching sequences from the complete database is a big challenge. The genome sequence matching algorithms are also computation intensive like BLAST; which performs large number of string matching operations. So to handle this genome sequence matching algorithms and to store data which is Big data; Hadoop is used. Hadoop is a parallel processing Big data framework. The genome sequence database can be stored on Hadoop distributed filesystem. And then can be efficient;y processed using Map/Reduce. The data is distributed in the form of blocks and for every block an instance of mapper is mapped to process the block and then output of all the mappers is combined by reducer. This Map/Reduce process has inter-node parallelism. To further speedup the process and to efficiently utilize the resources like Central processing unit and Graphical processing unit,

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to D genome, which is an important wild bio-source for the cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information are a versatile tool for species identification and phylogenetic implications in plants. Different ch...

متن کامل

Regulatory Sequence Analysis Tools

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative forma...

متن کامل

Independence of color intensity variation in red flesh apples from the number of repeat units in promoter region of the MdMYB10 gene as an allele to MdMYB1 and MdMYBA

MdMYB10 gene expression results in accumulation of anthocyanin in many tissues including flesh of applefruit. The MdMYB1 and MdMYBA genes are close homologues to MdMYB10 gene and both are responsiblefor red color phenotype in apple fruit skin. In the current study, an apple genome sequence draft analysisindicated that these three genes are located in a unique contig. Further a...

متن کامل

mpscan: Fast Localisation of Multiple Reads in Genomes

With Next Generation Sequencers, sequence based transcriptomic or epigenomic assays yield millions of short sequence reads that need to be mapped back on a reference genome. The upcoming versions of these sequencers promise even higher sequencing capacities; this may turn the read mapping task into a bottleneck for which alternative pattern matching approaches must be experimented. We present a...

متن کامل

Detection and Characterization of Weissellicin 110, a Bacteriocin Produced by Weissella cibaria

Background: Weissellicin 110 is the only bacteriocin reported in Weissella cibaria up to now. This bacteriocin represents several unique features. This is the first report on the gene sequence that encodes for the bacteriocin. Objectives: Providing a rapid detection method to isolate the weissellicin 110 encoding gene and determination of the bacteriocin distribution were the objectives. Materi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017